NSF PAR Search | NSF Public Access Repository


FluidFaaS: A Dynamic Pipelined Solution for Serverless Computing with Strong Isolation-based GPU Sharing

Hui, Xinning; Xu, Yuanchao; Shen, Xipeng (July 2025, The ACM 33rd International Symposium on High-Performance Parallel and Distributed Computing)

Free, publicly-accessible full text available July 20, 2026
Exploring Function Granularity for Serverless Machine Learning Application with GPU Sharing

https://doi.org/10.1145/3711699

Hui, Xinning; Xu, Yuanchao; Shen, Xipeng (March 2025, Proceedings of the ACM on Measurement and Analysis of Computing Systems)

Recent years have witnessed increasing interest in machine learning (ML) inferences on serverless computing due to its auto-scaling and cost-effective properties. However, one critical aspect, function granularity, has been largely overlooked, limiting the potential of serverless ML. This paper explores the impact of function granularity on serverless ML, revealing its important effects on the SLO hit rates and resource costs of serverless applications. It further proposes adaptive granularity as an approach to addressing the phenomenon that no single granularity fits all applications and situations. It explores three predictive models and presents programming tools and runtime extensions to facilitate the integration of adaptive granularity into existing serverless platforms. Experiments show adaptive granularity produces up to a 29.2% improvement in SLO hit rates and up to a 24.6% reduction in resource costs over the state-of-the-art serverless ML which uses fixed granularity.
Free, publicly-accessible full text available March 6, 2026
MC3: Memory Contention-based Covert Channel Communication on Shared DRAM System-on-Chips

Dagli, Ismet; Crea, James; Seckiner, Soner; Xu, Yuanchao; Köse, Selçuk; Belviranli, Mehmet (March 2025, Design, Automation and Test in Europe Conference 2025)

Shared memory system-on-chips (SM-SoCs) are ubiquitously employed by a wide range of computing platforms, including edge/IoT devices, autonomous systems, and smartphones. In SM-SoCs, system-wide shared memory enables a convenient and cost-effective mechanism for making data accessible across dozens of processing units (PUs), such as CPU cores and domain-specific accelerators. Due to the diverse computational characteristics of the PUs they embed, SM-SoCs often do not employ a shared last-level cache (LLC). Although covert channel attacks have been widely studied in shared memory systems, high-throughput communication has previously been feasible only by relying on an LLC or by possessing privileged or physical access to the shared memory subsystem. In this study, we introduce a new memory-contention-based covert communication attack, MC3, which specifically targets shared system memory in mobile SoCs. Unlike existing attacks, our approach achieves high-throughput communication without the need for an LLC or elevated access to the system. We explore the effectiveness of our methodology by demonstrating the trade-off between the channel transmission rate and the robustness of the communication. We evaluate MC3 on NVIDIA Orin AGX, NX, and Nano platforms and achieve transmission rates up to 6.4 Kbps with less than 1% error rate.
Free, publicly-accessible full text available March 31, 2026
Outback: Fast and Communication-Efficient Index for Key-Value Store on Disaggregated Memory

https://doi.org/10.14778/3705829.3705849

Liu, Yi; Xie, Minghao; Shi, Shouqian; Xu, Yuanchao; Litz, Heiner; Qian, Chen (October 2024, Proceedings of the VLDB Endowment)

Disaggregated memory systems achieve resource utilization efficiency and system scalability by distributing computation and memory resources into distinct pools of nodes. RDMA is an attractive solution to support high-throughput communication between different disaggregated resource pools. However, existing RDMA solutions face a dilemma: one-sided RDMA completely bypasses computation at memory nodes, but its communication takes multiple round trips; two-sided RDMA achieves one-round-trip communication but requires non-trivial computation for index lookups at memory nodes, which violates the principle of disaggregated memory. This work presents Outback, a novel indexing solution for key-value stores with a one-round-trip RDMA-based network that does not incur computation-heavy tasks at memory nodes. Outback is the first to utilize dynamic minimal perfect hashing and separates its index into two components: one memory-efficient and compute-heavy component at compute nodes and the other memory-heavy and compute-efficient component at memory nodes. We implement a prototype of Outback and evaluate its performance in a public cloud. The experimental results show that Outback achieves higher throughput than both the state-of-the-art one-sided RDMA and two-sided RDMA-based in-memory KVS by 1.06–5.03×, due to the unique strength of applying a separated perfect hashing index.
Full Text Available
ESG: Pipeline-Conscious Efficient Scheduling of DNN Workflows on Serverless Platforms with Shareable GPUs

https://doi.org/10.1145/3625549.3658657

Hui, Xinning; Xu, Yuanchao; Guo, Zhishan; Shen, Xipeng (June 2024, ACM)

Full Text Available
Data Enclave: A Data-Centric Trusted Execution Environment

https://doi.org/10.1109/HPCA57654.2024.00026

Xu, Yuanchao; Pangia, James; Ye, Chencheng; Solihin, Yan; Shen, Xipeng (March 2024, IEEE)

Full Text Available
Reconciling Selective Logging and Hardware Persistent Memory Transaction

https://doi.org/10.1109/HPCA56546.2023.10071088

Ye, Chencheng; Xu, Yuanchao; Shen, Xipeng; Sha, Yan; Liao, Xiaofei; Jin, Hai; Solihin, Yan (February 2023, IEEE International Symposium on High-Performance Computer Architecture)

Full Text Available
SpecPMT: Speculative Logging for Resolving Crash Consistency Overhead of Persistent Memory

https://doi.org/10.1145/3575693.3575696

Ye, Chencheng; Xu, Yuanchao; Shen, Xipeng; Sha, Yan; Liao, Xiaofei; Jin, Hai; Solihin, Yan (January 2023, ACM International Conference on Architectural Support for Programming Languages and Operating Systems)

Full Text Available
FFCCD: Fence-Free Crash-Consistent Concurrent Defragmentation for Persistent Memory

https://doi.org/10.1145/3470496.3527406

Xu, Yuanchao; Ye, Chencheng; Solihin, Yan; Shen, Xipeng (June 2022, The 49th International Symposium on Computer Architecture (ISCA))

Full Text Available
